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Abstract 

Calcium imaging for observing spiking activity from large populations of neurons are quickly gaining popularity. 
While the raw data are fluorescence movies, the underlying spike trains are of interest. This work presents a fast non- 
negative deconvolution filter to infer the approximately most likely spike train for each neuron, given the fluorescence 
observations. This algorithm outperforms optimal linear deconvolution (Wiener filtering) on both simulated and 
biological data. The performance gains come from restricting the inferred spike trains to be positive (using an interior- 
point method), unlike the Wiener filter. The algorithm is fast enough that even when imaging over 100 neurons, 
inference can be performed on the set of all observed traces faster than real-time. Performing optimal spatial filtering 
on the images further refines the estimates. Importantly, all the parameters required to perform the inference can 
be estimated using only the fluorescence data, obviating the need to perform joint electrophysiological and imaging 
calibration experiments. 

1 Introduction 

Simultaneously imaging large populations of neurons using calcium sensors is becoming increasingly popular IfTJ, 
both in vitro dEI and in vivo |0]|5]|6l, and will likely continue as the signal-to-noise-ratio (SNR) of genetic sensors 
continues to improve GHHED. Whereas the data from these experiments are movies of time-varying fluorescence 
traces, the desired signal consists of spike trains of the observable neurons. Unfortunately, finding the most likely 
spike train is a challenging computational task, due to limitations of the SNR and temporal resolution, unknown 
parameters, and computational intractability. 

A number of groups have therefore proposed algorithms to infer spike trains from calcium fluorescence data using 
very different approaches. Early approaches simply thresholded dF/F (e.g., ifTOlfTTl ) to obtain "event onset times." 
More recently, Greenberg et al. [ 12 1 developed a template matching algorithm to identify individual spikes. Holekamp 
et al. [ 13 1 then applied an optimal linear deconvolution (ie, the Wiener filter) to the fluorescence data. This approach 
is natural from a signal processing standpoint, but does not utilize the knowledge that spikes are always positive. 
Sasaki et al. lfT4l proposed using machine learning techniques to build a nonlinear supervised classifier, requiring 
many hundreds of examples of joint electrophysiological and imaging data to "train" the algorithm to learn what 
effect spikes have on fluorescence. Vogelstein et al. IT511 proposed a biophysical model-based sequential Monte Carlo 
method to efficiently estimate the probability of a spike in each image frame, given the entire fluorescence time-series. 
While effective, that approach is not suitable for online analyses of populations of neurons, as the computations run 
in about real-time per neuron (ie, analyzing one minute of data requires about one minute of computational time on a 
standard laptop computer). 

The present work starts by building a simple model relating spiking activity to fluorescence traces. Unfortu- 
nately, inferring the most likely spike train given this model is computationally intractable. Making some reasonable 
approximations leads to an algorithm that infers the approximately most likely spike train, given the fluorescence 
data. This algorithm has a few particularly noteworthy features, relative to other approaches. First, spikes are as- 
sumed to be positive. This assumption often improves filtering results when the underlying signal has this property 
|[T6j |T7j [T8l [T9l [20j [2TJ [22l [23j . Second, the algorithm is fast: it can process a calcium trace from 50,000 images in 
about one second on a standard laptop computer. In fact, filtering the signals for an entire population of about 100 
neurons runs faster than real-time. This speed facilitates using this filter online, as observations are being collected. In 
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addition to these two features, the model may be generalized in a number of ways, including incorporating spatial fil- 
tering of the raw movie. The efficacy of the proposed filter is demonstrated on several biological data-sets, suggesting 
that this algorithm is a powerful and robust tool for online spike train inference. The code (which is a simple Matlab 
script) is available from the authors upon request. 



2 Methods 

2.1 Data driven generative model 

FigureQ]shows data from a typical in vitro epifiuorescence experiment (see section l2~7l for data collection details). The 
top panel shows the mean frame of this movie, including 3 neurons, two of which are patched. To build the model, 
the pixels within a region-of-interest (ROI) are selected (white circle). Given the ROI, all the pixel intensities of each 
frame can be averaged, to get a one-dimensional fluorescence time-series, as shown in the bottom left panel (black 
line). By patching onto this neuron, the spike train can also be directly observed (black bars). Previous work suggests 
that this fluorescence signal might be well characterized by convolving the spike train with an exponential, and adding 
noise [ 1 1. This model is confirmed by convolving the true spike train with an exponential (gray line, bottom left panel), 
and then looking at the distribution of the residuals. The bottom right panel shows a histogram of the residuals (solid 
line), and the best fit Gaussian distribution (dashed line). 
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Figure 1 : Typical in vitro data suggest that a reasonable first order model may be constructed by convolving the spike 
train with an exponential and adding Gaussian noise. Top panel: the average (over frames) of a typical field-of-view. 
Bottom left: true spike train recorded via a patch electrode (black bars), convolved with an exponential (gray line), 
superimposed on the fluorescence trace (OGB- 1 ; black line). While the spike train and fluorescence trace are measured 
data, the calcium is not directly measured, but rather, inferred. Bottom right: a histogram of the residual error between 
the gray and black lines from the bottom left panel (dashed line), and the best fit Gaussian (solid line). Note that the 
Gaussian model provides a good fit for the residuals here. 
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The above observations may be formalized as follows. Assume there is a one-dimensional fluorescence trace, F 
(throughout this text X indicates the vector [X\, . . . , Xt], where T is the index of the final frame), from a neuron. 
At time t, the fluorescence measurement Ft is a linear-Gaussian function of the intracellular calcium concentration at 
that time, [Ca 2+ ] t : 

F t =a([Ca 2+ ]i+/3) + ( j£ t , e t ~AA(0,l). (1) 

The scale, a, absorbs all experimental variables impacting the scale of the signal, including the number of sensors 
within the cell, photons per calcium ion, amplification of the imaging system, etc. Similarly, the offset, (3, absorbs 
the baseline calcium concentration of the cell, background fluorescence of the fiuorophore, imaging system offset, etc. 
The standard deviation, a, results from calcium fluctuations independent of spiking activity, fluorescence fluctuations 
independent of calcium, and imaging noise. The noise at each time, e t , is independently and identically distributed 
according to a standard normal distribution (ie, Gaussian with zero mean and unit variance), as indicated by the 

notation ~ Af(0, 1). 

Then, assuming that the intracellular calcium concentration, [Ca 2+ ] t , jumps by A fjM after each spike, and subse- 
quently decays back down to Cb fjM with time constant r, one can write: 

[Ca 2 +] t+1 = (1 - A/r)[Ca 2+ ] t + (A/r)C b + An t (2) 

where A is the time step size — which is the frame duration, or 1 / (frame rate) — and n t indicates the number of times 
the neuron spiked in frame t. Note that because [Ca 2+ ]j and F t are linearly related to one another, the fluorescence 
scale, a, and calcium scale, A, are not identifiable. In other words, either can be set to unity without loss of generality, 
as the other can absorb the scale entirely. Similarly, the fluorescence offset, (3, and calcium baseline, Cb are not 
identifiable, so either can be set to zero without loss of generality. Finally, letting 7 = (1 — A/r), Eq. (O can be 
rewritten replacing [Ca 2+ ] t with its non-dimensionalized counterpart, C t : 

C t+ i=iC t + n t . (3) 

Note that Ct does not refer to absolute intracellular concentration of calcium, but rather, a relative measure (see lfT31l 
for a more general model). The gray line in the bottom left panel of Figure Q] corresponds to the putative C of the 
observed neuron. 

To complete the "generative model" (ie, a model from which simulations can be generated), the distribution from 
which spikes are sampled must be defined. Perhaps the simplest first order description of spike trains is that at each 
time, spikes are sampled according to a Poisson distribution with some rate: 

n t ~ Poisson(AA) (4) 

where AA is the expected firing rate per bin, and A is included to ensure that the expected firing rate is independent 
of the frame rate. Thus, Eqs. (Q]), (f3]), and (0]i complete the generative model. 

2.2 Goal 

Given the above model, the goal is to find the maximum a posteriori (MAP) spike train, i.e., the most likely spike 
train, n, given the fluorescence measurements, F: 

n = argmax P[n | F], (5) 

n t £N Vt 

where P[n|.F] is the posterior probability of a spike train, n, given the fluorescent trace, F, and n t is constrained to 
be an integer, No = {0, 1,2,.. .}, because of the above assumed Poisson distribution. From Bayes' rule, the posterior 
can be rewritten: 

P\n F] 1 

P[n\F] = -L^i = P[F\n]P[n], (6) 
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where P[F] is the evidence of the data, P[Jf|n] is the likelihood of observing a particular fluorescence trace F, given 
the spike train n, and P[n] is the prior probability of a spike train. Plugging the far right-hand-side of Eq. (|6]l into 
Eq. (0, yields: 

n = argmax — - — -P[.F|n]P[Ti] = argmax P[.F|n]P[Ti], (7) 

n t GN Vt P[F\ ri t GNoVt 

where the second equality follows because P[F] merely scales the results, but does not change the relative quality of 
various spike trains. Both P[P|n] and P[n] are available from the above model: 

T 

P[F\n] = P[F\C] = l[P[F t \C t ], (8a) 

t=i 

T 

P[n] = Y[P[nt], (8b) 

t=i 

where the first equality in Eq. d8at follows because C is deterministic given n, and the second equality follows from 
Eq. (Q]). Further, Eq. d8bl follows from the Poisson process assumption, Eq. (0]). Both P[P t |Ct] and P[n t ] can be 
written explicitly: 

P[F t \C t ] =Af(a(C t +(3),a 2 ), (9a) 
P[n t ] = Poisson(AA), (9b) 

where both equations follow from the above model. Now, plugging Eq. (O back into ([S]), and plugging that result into 
Eq. yields: 

A 1 / 1 {Ft ~ o-{Ct + P)Y \ exp{-AA}(AA)"' 

n = argmax , exp < — - > ■ (10a) 

n^NoVt^VW p \ 2 a 2 J n t ! 

= argmax V \-^{F t - a(C t + (3)f + n t log AA - logroll , (10b) 

where the second equality follows from taking the logarithm of the right-hand-side and dropping terms that do not 
depend on n. Unfortunately, solving Eq. fllObl ) exactly is computationally intractable, as it requires a nonlinear search 
over an infinite number of possible spike trains. The search space could be restricted by imposing an upper bound, k, 
on the number of spikes within a frame. However, in that case, the computational complexity scales exponentially with 
the number of image frames — i.e., the number of computations required would scale with k T — which for pragmatic 
reasons is intractable. 



2.3 Inferring the most likely spike train, given a fluorescence trace 

The goal here is to develop an algorithm to efficiently approximate n, the most likely spike train, given the fluorescence 
trace. Because of the computational intractability described above, Eq. ( fTOb is approximated by modifying Eq. (O, 
replacing the Poisson distribution with an exponential distribution of the same mean. Modifying Eq. ( fTOb to incorporate 
this approximation yields: 

n w argmax TT | —= exp 
n t >ovt fJi I V2na 2 

T 1 

= argmax V- — (F t - a(C t +/3)) 2 - n t XA (lib) 
« t >ovt fr{ 2a z 

where the constraint on n t has been relaxed from n t £ No to n t > (since the exponential distribution can yield 
any non-negative number). The advantage of this approximation is that the optimization problem becomes concave 
in C, meaning that any gradient ascent method guarantees achieving the global maximum (because there are no local 
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maxima, other than the single global maximum). To see that Eq. 
n t = Ct — jCt-i, so Eq. d 1 lbb can be rewritten: 



T 

argmax N 

C t -7C t _i>0V*^ 



"2^ 



1 lbb is concave in C, rearrange Eq. (0 to obtain, 



(12) 



a(C t +f3)) 2 - (C t - 7 C t _i)AA 



which is a sum of terms that are concave in Ct, so the whole right-hand-side is concave. Unfortunately, the integer con- 
straint has been lost, i.e., the answer could include "partial" spikes. This disadvantage can be remedied by thresholding 
(ie, setting n t = 1 for all n t greater than some threshold, and the rest setting to zero), or by considering the magnitude 
of a partial spike at time t as an indication of the probability of a spike occurring during frame t. Note that replacing a 
Poisson with an exponential is a common approximation technique in the machine learning literature Il24l[23l , as the 
exponential distribution is the closest log-concave relaxation to its non-log-concave counterpart, the Poisson distribu- 
tion. More specifically, the probability mass function of a Poisson distributed random variable with low rate is very 
similar to the probability density function of a random variable with an exponential distribution. While this convex 
relaxation makes the problem tractable, the "sharp" threshold imposed by the non-negativity constraint prohibits the 
use of standard gradient ascent techniques. This may be rectified by dropping the sharp threshold, and adding a bar- 
rier term which must approach — oo as nt approaches zero (this approach is often called an "interior-point" method). 
Iteratively reducing the weight of the barrier term guarantees convergence to the correct solution. Thus, the goal is to 
efficiently solve: 



C 2 = argmax 



1 

2^2 



(Ft - a(C t + P)f - {C t - tQ-i)AA + z\og(C t - ~fC t -i) 



(13) 



where log(-) is the "barrier term", and z is the weight of the barrier term. Iteratively solving for C z for z going 
from one down to nearly zero, guarantees convergence to C J24). The concavity of Eq. ( fT3l facilitates utilizing any 
number of techniques guaranteed to find the global maximum. Because the argument of Eq. (fT3l is twice analytically 
differentiable, one can use the Newton-Raphson technique ll25l . The special tridiagonal structure of the Hessian 
enables each Newton-Raphson step to be very efficient (as described below). To proceed, Eq. ( fT3l l can be rewritten in 
matrix notation. Note that: 
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where M £ 1 ) xT is a bidiagonal matrix. Then, letting 1 be a T — 1 dimensional column vector and A 
yields the objection function, Eq. ( fT3l . in more compact matrix notation: 



C z = argmax - 

MOO 



1 

2^2 



\F-a(C 



■ml 



(MC) T A + zlog(MC) T l, 



(14) 



AAl 



(15) 



where MC > indicates that every element of MC is greater than or equal to zero, T indicates the transpose 
operation, log(-) indicates an element-wise logarithm, and ||x|| 2 is the standard L 2 norm, i.e., ||x|| 2 = J^i x i- When 
using Newton-Raphson to ascend a surface, one iteratively computes both the gradient (first derivative) and Hessian 
(second derivative) of the argument to be maximized, with respect to the variables of interest (C here). Then, the 
estimate is updated using C z «— C z + sd, where s is the step size and d is the step direction obtained by solving 
Hd = g. The gradient, g, and Hessian, H, for this model, with respect to C, are given by: 



g = --(F-a(C T 

2 

H = %I + zM T (MC)- 2 M 



13)) + M T A - zM T (MC)- 1 



(16a) 
(16b) 



where the exponents on the vector MC indicate element-wise operations. The step size, s, is found using "backtrack- 
ing linesearches", which finds the maximal s that increases the posterior and is between zero and one l25ll . 



5 



Vogelstein JT, et al 



Fast spike train inference from calcium imaging 



Typically, implementing Newton-Raphson requires inverting the Hessian, i.e., solving d = H g, a computation 
that scales cubically with T (requires on the order of T 3 operations). Already, this would be a drastic improvement over 
the most efficient algorithm assuming Poisson spikes, which would require k T operations (where k is the maximum 
number of spikes per frame). Here, because M is bidiagonal, the Hessian is tridiagonal, so the solution may be found 
in about T operations, via standard banded Gaussian elimination techniques (which can be implemented efficiently in 
Matlab using H\g, assuming H is represented as a sparse matrix) ll23ll . In other words, the above approximation and 
inference algorithm reduces computations from exponential time to linear time. Appendix|A]contains pseudocode for 
this algorithm, including learning the parameters, as described below. 



2.4 Learning the parameters 

We assumed above that the parameters governing the model, = {a, (3, a, 7, A}, were known, but in practice they 
are typically unknown. An algorithm to estimate the most likely parameters, 6, could proceed as follows: (i) initialize 
some estimate of the parameters, 6, then (ii) recursively compute n using those parameters, and update given the 
new n, until some convergence criteria is met. Below, details are provided for each step. 

2.4.1 Initializing the parameters 

Because the model introduced above is linear, the scale of F relative to n is arbitrary. Therefore, before filtering, F 
is linearly "squashed" between zero and one, ie F <— (F — F m i n )/(F max — F m i n ), where F m i n and F max are the 
observed minimum and maximum of F, respectively. Given this normalization, a is set to one. Because spiking is 
assumed to be sparse, F tends to be around baseline, so (3 is initialized to be the median of F , and a is initialized 
as the median absolute deviation of F, i.e., a — median t (|F t — median s (F s )|)/_ftT, where median^ATj) indicates the 
median of X with respect to index i, and K — 1.4785 is the correction factor when using median absolute deviation 
as a robust estimator of the standard deviation. Because in these data, the posterior tends to be relatively flat along 
the 7 dimension, i.e., large changes in 7 result in relatively small changes in the posterior, estimating 7 is difficult. 
Further, previous work has shown that results are somewhat robust to minor variations in time constant [26]; therefore 
7 is initialized at 1 — A/ (lsec), which is fairly typical |27|. Finally, A is initialized at 1 Hz, which is between typical 
baseline and evoked spike rate for these data. 

2.4.2 Estimating the parameters given n 

Ideally, one could integrate out the hidden variables, to find the most likely parameters: 

= argmax / P[F, C\0]dC = argmax / P[F\C; 6]P[C\0]dC. (17) 
9 J 9 J 

However, evaluating those integrals is not particularly tractable. Therefore, Eq. ( fT7T > is approximated by simply maxi- 
mizing the parameters given the MAP estimate of the hidden variables: 

8 w argmaxP[F,C|0] = argmax P[F\C; 8]P[n\d] = argmax{logP[F|C; 6} + logP[n|0]}, (18) 
9 9 

where C and n are determined using the above described inference algorithm. The approximation in Eq. ( fT8l ) is good 
whenever most of the mass in the integral in Eq. ( fl~8l > is around the MAP sequence, The argument from the 
right-hand-side of Eq. ( TT~8b may be expanded: 

T T 

logP[F\C;e]+logP[n\e}=^OEP[Ft\C t \a,0,a]+^ogP[n t \X\. (19) 

t=l i=l 

Note that the two terms in the right-hand-side of Eq. (fT9b may be optimized separately. The maximum likelihood 
estimate (MLE) for the observation parameters, {a, (3, a}, is therefore given by: 



1 ( F t -a(C t +p)\ 2 



= argmax V log P[F t | C t ;(3,a] = argmax -i(27rcr 2 ) - i 

a,f3,a>0 TT^ /3,cr>0 * * \ O 



(20) 



'Eq. {T8j may be considered a first-order Laplace approximation 1281 . 



6 



Vogelstein JT, et al 



Fast spike train inference from calcium imaging 



Note that a rescaling of a may be offset by a complementary rescaling of C and /3. Therefore, because the scale of C 
is arbitrary, a can be set to one without loss of generality. Plugging a = 1 into Eq. d20l l, and maximizing with respect 
to /? yields: 

T 

/3 = argmaxV-( J F 1 t -(O t +/3)) 2 . (21) 

Computing the gradient with respect to /3, setting the answer to zero, and solving for /3, yields /3 = A ^2 t {F t — Ct). 
Similarly, computing the gradient of Eq. (l20t with respect to a, setting it to zero, and solving for a yields: 

5 = ^Y,(Ft-C t -0) 2 , (22) 

which is simply the root-mean-square of the residual error. Finally, the MLE of A is given by solving: 

A = argmax VVlog(AA) - n t AA), (23) 
a>o l 

which, again, computing the gradient with respect to A, setting it to zero, and solving for A, yields A = T/(A J^t ^*)> 
which is the inverse of the inferred average firing rate. 

Iterations stop whenever (i) the iteration number exceeds some upper bound, or (ii) the relative change in likelihood 
does not exceed some lower bound. In practice, parameters tend to converge after several iterations, given the above 
initializations. 



2.5 Spatial filtering 

In the above, we assumed that the raw movie of fluorescence measurements collected by the experimenter had under- 
gone two stages of preprocessing before filtering. First, the movie was segmented, to determine regions-of-interest 
(ROIs), yielding a vector, F t = (Fi t, ■ ■ ■ , Fn ,t), which corresponded to the fluorescence intensity at time t for each 
of the N p pixels in the ROI. Second, at each time t, that vector was projected into a scalar, yielding Ft, the assumed 
input to the filter. In this section, the optimal projection is determined by considering a more general model: 

F x ,t=a x (C t + P)+(JE x , u Ex ,f~^(0,l) (24) 

where a x scales each pixel, from which some number of photons are contributed due to calcium fluctuations, Ct, and 
others due to baseline fluorescence, (3. Further, the noise is assumed to be both spatially and temporally white, with 
standard deviation, a, in each pixel (this assumption can easily be relaxed, by modifying the covariance matrix of 
e x .t). Performing inference in this more general model proceeds in a nearly identical manner as before. In particular, 
the maximization, gradient, and Hessian become: 



1 

argmax j 

MOO 2(7 



F - a(C T + /3 T ) - (MC) T A + z log(MC) T l, (25) 

F 

g = ^(F - a(C T + (3)) - M T X + zM T (MCy 1 (26) 

H = -^-1 - zM T (MCy 2 M (27) 



where F is an N p x T element matrix, a is a column vector of length N p , (3 — (31t, where It is a column vector of 
ones with length T, I is an N p x N p identity matrix, and ||x|| F indicates the Frobenius norm, i.e. \\x\\ F = Yli j x l j- 
Note that to speed up computation, one can first project the N c x T dimensional movie onto the spatial filter, d, 
yielding a one-dimensional time series, F, reducing the problem to evaluating a T x 1 vector norm, as in Eq. ([TBI l. 

Typically, the parameters a and j3 are unknown, and therefore must be estimated from the data. Following the 
strategy developed in the last section, we first initialize the parameters. Let the initial spatial filter be the median 
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image frame, i.e., a x = median t (F xt ), and the initial offset be the total movie median, (3 — median xt {F X}t ). Given 
these robust initializations, the maximum likelihood estimator for each a x is given by: 

a x = argmax P[F X \C] = argmax log P [F Xjt \ Ct] (28a) 
= argmax V (-I^Tra 2 ) - -L f Fx , t - a x {C t + = argmax V ~(F x . t - a x (C t - (3)) 2 , (28b) 

«i t L ^ Z(T \ /J ctx t 

which is solved by regressing (C + (3) onto f x . In other words, by computing the gradient of Eq. ( I28bb with respect 

to a x and setting to zero, one obtains (using Matlab notation): a x = (C ' + f3)\F x . Computing the gradient of Eq. d28l ) 
with respect to (3, setting the result to zero, and solving for 0, yields: 

1 ly P t =l x=l 

Iterating these two steps results in a coordinate ascent approach to estimate a and (3 [24 1. As in the scalar F t case, we 
iterate estimating the parameters of this model, = {a, (3, a, 7, A}, and the spike train, n. Because of the free scale 
term discussed in section [2~4l the absolute magnitude of a is not identifiable. Thus, convergence is defined here by the 
"shape" of the spike train converging, i.e., the norm of the difference between the inferred spike trains from subsequent 
iterations, both normalized such that max(n t ) = 1. In practice, this procedure converged after several iterations. 



2.6 Overlapping spatial filters 

It is not always possible to segment the movie into pixels containing only fluorescence from a single neuron. Therefore, 
the above model can be generalized to incorporate multiple neurons within an ROI. Specifically, letting the superscript 
i index the N c neurons in this ROI yields: 

F t = Y j a l {C l t +f3 l ) + ae u 

i=l 

C\ = 7*C^_! + n\, n\ ~ Poisson(nj; A, A) (31) 

where each neuron is implicitly assumed to be independent, and each pixel is conditionally independent and identically 
distributed with standard deviation a, given the underlying calcium signals. To perform inference in this more general 
model, let 1 and correspond to an N c x 1 row vector of ones, and zeros, respectively, and n t = [n\ , . . . , nf c ], 
C t = [Cl Cf=], and T = [-7 1 , - 7 ^] T , yielding: 



MC = 



and proceed as above. Note that Eq. d32l is very similar to Eq. ( fT4b , except that M is no longer tridiagonal, but 
rather, block tridiagonal (and Ct and n t are vectors instead of scalars). Importantly, the Thomas algorithm, which is a 
simplified form of Gaussian elimination, finds the solution to linear equations with block tridiagonal matrices in linear 
time, so the efficiency gained from utilizing the tridiagonal structure above is maintained for this block tridiagonal 
structure ll25ll . 

If the parameters are unknown, they must be estimated. Define ol x = [a x , . . . , a x "] T and (3 = [(3 1 , . . . , f3 Na ] T . 
To initialize, let (3 — median^^i^^)!.^, where corresponds to a column vector of length N p . To initialize a = 
[c?i, . . . , ajvj, the goal is to be able to represent the matrix, F, as the sum of only iV c time-varying components, also 
known as a low-rank approximation. Singular value decomposition is a tool known to find the low -rank approximation 
to a matrix, with the smallest mean-square-error of all possible low -rank approximations [29 1 . Because the "singular 



s t ~ Af(0, 1) (30) 
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values" are equivalent to the "principal components" of the covariance of the movie, a natural initial estimate for 
the N c a vectors are the N c first principal components. While other methods to initialize the spatial filters (such as 
"independent component analysis" [ 30 1 ) could also work, because fast algorithms for computing the first few principal 
components are readily available OIL PCA was both sufficiently effective and efficient. Given these initializations, 
estimating a. and f3 follows very similarly as in Eqs. d28l and d29l ): 



parameters, (3, concatenated T times. Convergence of parameters and spike trains in this model behaved similarly 
to the scenario described in section |231 assuming the spikes were sufficiently uncorrected and observations had a 
sufficiently high SNR. 

2.7 Experimental Methods 

2.7.1 Slice Preparation and Imaging 

All animal handling and experimentation was done according to the National Institutes of Health and local Institutional 
Animal Care and Use Committee guidelines. Somatosensory thalamocortical or coronal slices 350-400 /im thick were 
prepared from C57BL/6 mice at age P14 as described |32|. Neurons were filled with 50 /iM Oregon Green Bapta 1 
hexapotassium salt (OGB-1; Invitrogen, Carlsbad, CA) through the recording pipette or bulk loaded with Fura-2 AM 
(Invitrogen, Carlsbad, CA). Pipette solution contained 130 mM K-methylsulfate, 2 mM MgCl 2 , 0.6 mM EGTA, 10 
mM HEPES, 4 mM ATP-Mg, and 0.3 mM GTP-Tris, pH 7.2 (295 mOsm). After cells were fully loaded with dye, 
imaging was done by using a modified BX50-WI upright microscope (Olympus, Melville, NY). Image acquisition was 
performed with the C9100-12 CCD camera from Hamamatsu Photonics (Shizuoka, Japan) with arclamp illumination 
with excitation and emission bandpass filters at 480-500 nm and 510-550 nm, respectively (Chroma, Rockingham, 
VT) for Oregon Green. Imaging of Fura-2 loaded slices was performed with a confocal spinning disk (Solamere 
Technology Group, Salt Lake City, UT) and an Orca CCD camera from Hamamatsu Photonics (Shizuoka, Japan). 
Images were saved and analyzed using custom software written in Matlab (Mathworks, Natick, MA). 

2.7.2 Electrophysiology 

All recordings were made using the Multiclamp 700B amplifier (Molecular Devices, Sunnyvale, CA), digitized with 
National Instruments 6259 multichannel cards and recorded using custom software written using the Lab View platform 
(National Instruments, Austin, TX) . Square pulses of sufficient amplitude to yield the desired number of action 
potentials were given as current commands to the amplifier using the Lab View and National Instruments system. 

2.7.3 Fluorescence preprocessing 

Traces were extracted using custom Matlab scripts to (i) segment the mean image into ROIs, and then (ii) average all 
the pixels within the ROI. The Fura-2 fluorescence traces were inverted. As some slow drift was sometimes present in 
the traces, each trace was Fourier transformed, and all frequencies below 0.5 Hz were set to zero (0.5 Hz was chosen 
by eye), and the resulting fluorescence trace was then normalized to be between zero and one. 

3 Results 
3.1 Main Result 

The main result of this paper is that the fast filter can find the approximately most likely spike train, n, very efficiently, 
and that this approach yields more accurate spike train estimates than optimal linear deconvolution. Fig. [2] depicts 




(33) 




(34) 



where Eq. ( 1331 is solved efficiently in Matlab using a. 



(C + (3)\F X , where (3 is the set of estimated baseline 
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a simulation showing this result. Clearly, the fast filter's inferred "spike train" (third panel) more closely resembles 
the true spike train (second panel) than the optimal linear deconvolution's inferred spike train (bottom panel; Wiener 
filter). Note that neither filter results in an integer sequence, but rather, each infers a real number at each time. 

The Wiener filter implicitly approximates the Poisson spike rate with a Gaussian spike rate (see Appendix FBI for 

details). A Poisson spike rate indicates that in each frame, the number of possible spikes is an integer, 0, 1, 2, 

The Gaussian approximation, however, allows for any real number of spikes in each frame, including both partial 
spikes, e.g., 1.4, and negative spikes, e.g., -0.8. While a Gaussian well approximates a Poisson distribution when rates 
are about 10 spikes per frame, this example is very far from that regime, so the Gaussian approximation performs 
relatively poorly. More specifically, the Wiener filter exhibits a "ringing" effect. Whenever fluorescence drops rapidly, 
the most likely underlying signal is a proportional drop. Because the Wiener filter does not impose a non-negative 
constraint on the underlying signal (which, in this case, is a spike train), it infers such a drop. After such a drop has 
been inferred, since no corresponding drop occurred in the true underlying signal here, a complementary jump is often 
then inferred, to "re-align" the inferred signal with the observations. This oscillatory behavior results in poor inference 
quality. The non-negative constraint imposed by the fast filter prevents this because the underlying signal never drops 
below zero, so the complementary jump never occurs either. 

The inferred "spikes", however, are still not binary events when using the fast filter. This is a by-product of 
approximating the Poisson distribution on spikes with an exponential (cf. Eq. dl lab ), because the exponential is a 
continuous distribution, versus the Poisson, which is discrete. The height of each spike is therefore proportional to 
the inferred calcium jump size, and can be thought of as a proxy for the confidence with which the algorithm believes 
a spike occurred. Importantly, by utilizing the Gaussian elimination and interior-point methods, as described in the 
Methods section, the computational complexity of fast filter is the same as an efficient implementation of the Wiener 
filter. 

Although in Figure|2]the model parameters were provided, in the general case, the parameters are unknown, and 
must therefore be estimated from the observations (as described in section |2~4l >. Importantly, this algorithm does not 
require "training" data, i.e., there is no need for joint imaging and electrophyiological experiments to estimate the 
parameters governing the relationship between the two. Figure [3] shows another simulated example; in this example, 
however, the parameters are estimated from the observed fluorescence trace alone. Again, it is clear that the fast filter 
far outperforms the Wiener filter. 

Given the above two results, the fast filter was applied to real data. More specifically, by jointly recording elec- 
trophysiologically and imaging, the true spike times are known, and the accuracy of the two filters can be compared. 
Figure |4] shows a result typical of the 12 joint electrophysiological and imaging experiments conducted. Although it 
is difficult to see in this figure, the first four "events" are actually pairs of spikes, which is reflected by the width and 
height of the corresponding inferred spikes when using the fast filter. This suggests that although the scale of n is 
arbitrary, the fast filter can correctly ascertain the number of spikes within spike events. 

Figure [5] further evaluates this claim. While recording and imaging, the cell was forced to spike once, twice, or 
thrice, for each spiking event. Naively, this would suggest that an algorithm based on a purely linear model would 
struggle to resolve spike fidelity in this high frequency spiking regime. However, the fast filter infers the correct 
number of spikes in each event. On the contrary, there is no obvious way to count the number of spikes within each 
event when using the Wiener filter. 

3.2 Online analysis of spike trains using the fast filter 

A central aim for this work was the development of an algorithm that infers spikes fast enough to use online while 
imaging a large population of neurons (eg, as 100). Figure [6] shows a segment of the results of running the fast filter 
on 136 neurons, recorded simultaneously, as described in section l2?7l Note that the filtered fluorescence signals show 
fluctuations in spiking much more clearly than the unfiltered fluorescence trace. These spike trains were inferred in less 
than imaging time, meaning that one could infer spike trains for the past experiment while conducting the subsequent 
experiment. More specifically, a movie with 5,000 frames of 100 neurons can be analyzed in about ten seconds on a 
standard desktop computer. Thus, if that movie was recorded at 50 Hz, while collecting the data required 100 seconds, 
inferring spikes only required ten seconds, a ten-fold improvement over real-time. 
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Figure 2: The fast filter's inferred spike train is significantly more accurate than the output of the optimal linear 
deconvolution (Wiener filter) on typical simulated data. Note that neither filter constrains the inference to be a sequence 
of integers; rather, the fast filter relaxes the constraint to allow all non-negative numbers, and the Wiener filter allows 
for all real numbers. The restriction of the fast filter to exclude negative numbers eliminates the ringing effect seen 
in the Wiener filter output, resulting in a much cleaner inference. Note that the magnitude of the inferred spikes in 
the fast filter output is proportional to the inferred calcium jump size. Top panel: fluorescence trace. Second panel: 
spike train. Third panel: fast filter inference. Bottom panel: Wiener filter inference. Note that the gray bars in the 
bottom panel indicate negative spikes. Black '+'s in bottom two panels indicate true spike times. Simulation details: 
T w 3000 time steps, A = 5 msec, a = 1, f3 = 0, a = 0.3, r = 1 sec, A = 1 Hz. Conventions for other figures as 
above, unless otherwise indicated. 



3.3 Extensions 

Section lZTI describes a simple principled first-order model relating the spike trains to the fluorescence trace. A number 
of the simplifying assumptions can be straightforwardly relaxed, as described below. 

3.3.1 Replacing Gaussian observations with Poisson 

In the above, observations were assumed to have a Gaussian distribution. The statistics of photon emission and 
counting, however, suggest that a Poisson distribution would be more natural, especially for two-photon data [33], 
yielding: 

F t ~ Poisson(aC t + (3). (35) 

One additional advantage to this model over the Gaussian model, is that the variance parameter, a 2 , no longer exists, 
which might make learning the parameters simpler. Importantly, the log-posterior is still concave in C, as the prior 
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Figure 3: The fast filter significantly outperforms the Wiener filter, even when the parameters unknown. For both 
filters, the appropriate parameters were estimated using only the data shown above, unlike Figured in which the true 
parameters were provided to the filters. Simulation details as in Figure[2] 



remains unchanged, and the new log-likelihood term is a sum of terms concave in C: 

T T 

\ogP[F\C] =J2^gP[F t \C t ] = ^{F t log(aC t + /3) - (aC t + (3) - log(F t !)}. (36) 
t=i t=i 

The gradient and Hessian of the log-posterior can therefore be computed analytically by substituting the above like- 
lihood terms for those implied by Eq. (HJ. In practice, however, modifying the filter for this model extension did not 
seem to significantly improve inference results in any simulations or data (not shown). 

3.3.2 Allowing for a time-varying prior 

In Eq. (O, the rate of spiking is a constant. Often, additional knowledge about the experiment, including external 
stimuli, or other neurons spiking, can provide strong time-varying prior information 1 15 1. A simple model modification 
can incorporate that feature: 

n t l ~ Poisson(A t A), (37) 

where X t is now a function of time. Approximating this time-varying Poisson with a time-varying exponential with the 
same time-varying mean (similar to Eq. dl lab ), and letting A = [Ai, . . . , At] t A, yields an objective function identical 
to Eq. ( fTBT ), so log-concavity is maintained, and the same techniques may be applied. However, as above, this model 
extension did not yield any significantly improved filtering results (not shown). 
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Figure 4: The fast filter significantly outperforms the Wiener filter on typical in vitro data, using OGB- 1. Note that all 
the parameters for both filters were estimated only from the fluorescence data in the top panel (ie, not considering the 
voltage data at all). Again, '+'s denote true spike times extracted from the patch data, not inferred spike times from 
F. 



3.3.3 Saturating fluorescence 

Although all the above models assumed a linear relationship between F t and Ct, the relationship between fluorescence 
and calcium is typically better approximated by the nonlinear Hill equation |27|. Modifying Eq. (fl]i to reflect this 
change yields: 

F t = a— %- + + <re t , £ t ~AA(0,l). (38) 
C t + k d 

Importantly, log-concavity of the posterior is no longer guaranteed in this nonlinear model, meaning that converging 
to the global maximum is no longer guaranteed. Assuming a good initialization can be found, however, if this model 
is more accurate, then ascending the gradient for this model might yield improved inference results. In practice, 
initializing with the inference from the fast filter assuming a linear model (eg, Eq. (l30l l) often resulted in nearly equally 
accurate inference, but inference assuming the above nonlinearity was far less robust than the inference assuming the 
linear model (not shown). 



3.3.4 Using the fast filter to initialize the sequential Monte Carlo filter 

A sequential Monte Carlo (SMC) method to infer spike trains can incorporate this saturating nonlinearity, as well as 
the other model extensions discussed above 1 15 1 . However, this SMC filter is not nearly as computationally efficient 
as the fast filter proposed here. Like the fast filter, the SMC filter estimates the model parameters in a completely 
unsupervised fashion, i.e., from the fluorescence observations, using an expectation-maximization algorithm (which 
requires iterating between computing the expected value of the hidden variables — C and n — and updating the 
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Figure 5: The fast filter can often resolve the correct number of spikes within each spiking event, while imaging using 
OGB-1, given sufficiently high SNR. It is difficult, if not impossible, to count the number of spikes given the Wiener 
filter output. Recording and fitting parameters as in Figure H] Note that the parameters were estimated using a 60 sec 
long recording, of which only a fraction is shown here, to more clearly depict the number of spikes per event. 



paramters). In fl"5l , parameters for the SMC filter were initialized based on other data. While effective, this initializa- 
tion was often far from the final estimates, and therefore, required a relatively large number of iterations (eg, 20-25) 
before converging. Thus, it seemed that the fast filter could be used to obtain an improvement to the initial parameter 
estimates, given an appropriate rescaling to account for the nonlinearity, thereby reducing the required number of it- 
erations to convergence. Indeed, Figure [7] shows how the SMC filter outperforms the fast filter on biological data, and 
only required 3-5 iterations to converge on this data, given the initialization from the fast filter (which was typical). 
Note that the first few events of the spike train are individual spikes, resulting in relatively small fluorescence fluctu- 
ations, whereas the next events are actually spike doublets or triplets, causing a much larger fluorescence fluctuation. 
Only the SMC filter picks up the individual spikes in this trace, a result typical when the effective signal-to-noise 
ratio (SNR) is poor. Thus, these two inference algorithms are complementary: the fast filter can be used for rapid, 
online inference, and for initializing the SMC filter, which can then be used to further refine the spike train estimate. 
Importantly, although the SMC filter often outperforms the fast filter, the fast filter is more robust, meaning that it more 
often works "out-of-the-box." 



3.4 Spatial filter 

In the above, the filters operated on one-dimensional fluorescence traces. Typically, the data are time-series of images 
which are first segmented into regions-of-interest (ROI), and then (usually) averaged to obtain F t . In theory, one could 
improve the effective SNR of the fluorescence trace by scaling each pixel relative to one another. In particular, pixels 
not containing any information about calcium fluctuations can be ignored, and pixels that are partially anti-correlated 
with one another could have weights with opposing signs. 

Figure [8] demonstrates the potential utility of this approach. The top row shows different depictions of an ROI 
containing a single neuron. On the far left panel is the true spatial filter for this neuron. This particular spatial filter 
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Figure 6: The fast filter infers spike trains from a large population of neurons imaged simultaneously in vitro, using 
Fura-2, faster than real-time. Specifically, inferring the spike trains from this 400 sec long movie including 1 36 neurons 
requires only about 40 sec on a standard laptop computer. The inferred spike trains much more clearly convey neural 
activity than the raw fluorescence traces. Although no intracellular "ground truth" is available on this population data, 
the noise seems to be reduced, consistent with the other examples with ground truth. Left panel: Mean image field, 
segmented into ROIs each containing a single neuron. Middle panel: example fluorescence traces. Right panel: fast 
filter output corresponding to each associated trace. Note that neuron identity is indicated by color across the three 
panels. 



was chosen based on experience analyzing both in vitro and in vivo movies; often, it seems that the pixels immediately 
around the soma are anti-correlated with those in the soma. This effect is possibly due to the influx of calcium from the 
extracellular space immediately around the soma. The simulated movie (not shown) is relatively noisy, as indicated by 
the second panel, which depicts an exemplary image frame. The standard approach, given such a noisy movie, would 
be to first segment the movie to find an ROI corresponding to the soma of this cell, and then spatially average all the 
pixels found to be within this ROI. The third panel shows this standard "boxcar spatial filter". The fourth panel shows 
the mean frame. Clearly, this mean frame is very similar to the true spatial filter. 

The bottom panels of Figure [8] depict the effect of using the true spatial filter, versus the typical one. The left 
side shows the fluorescence trace and its associated spike inference obtained from using the typical spatial filter. The 
right side shows the same when using the true spatial filter. Clearly, the true spatial filter results in a much cleaner 
fluorescence trace and spike inference. When the true spatial filter is a single Gaussian, the boxcar spatial filter works 
about as well as the true spatial filter (not shown). 

3.5 Overlapping spatial filters 

The above shows that if a ROI contains only a single neuron, the effective SNR can be enhanced by spatially filter- 
ing. However, this analysis assumes that only a single neuron is in the ROI. Often, ROIs are overlapping, or nearly 
overlapping, making the segmentation problem more difficult. Therefore, it is desirable to have an ability to crudely 
segment, yielding only a few neurons in each ROI, and then spatially filter within each ROI to pick out the spike trains 
from each neuron. This may be achieved in a principled manner by generalizing the model as described in section lZSl 
Figure|9]shows how this approach can separate the two signals, assuming that the spatial filters of the two neurons are 
known. 

Typically, the true spatial filters of the neurons in the ROI will be unknown, and thus, must be estimated from 
the data. This problem may be considered a special case of blind source separation 041 1301 . Figure [TOl shows that 
multiple signals can be separated, with reasonable assumptions on correlations between the signals, and SNR. Note 
that separation occurs even though the signal is overlapping in several pixels (top left panel), leading to a "bleed- 
through" effect in the one-dimensional fluorescence projections (bottom left panel). The inferred filters (top middle 
and right panels) are not the true filters, but rather, their sum is linearly related to the sum of the true filters. Regardless, 
the inferred spike trains are well separated (bottom middle and right panels). 
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Figure 7: The fast filter effectively initializes the parameters for the SMC filter, significantly reducing the number of 
expectation-maximization iterations to convergence, on typical in vitro data, using OGB- 1. Note that while the fast 
filter clearly infers the spiking events in the end of the trace, those in the beginning of the trace are less clear. On the 
other hand, the SMC filter more clearly separates non-spiking activity from true spikes. Also note that the ordinate on 
the bottom panel corresponds to the inferred probability of a spike having occurred in each frame. 



4 Discussion 



This work describes an algorithm that approximates the maximum a posteriori (MAP) spike train, given a calcium 
fluorescence movie. The approximation is required because finding the actual MAP estimate is not currently compu- 
tationally tractable. Replacing the assumed Poisson distribution on spikes with an exponential distribution yields a 
log-concave optimization problem, which can be solved using standard gradient ascent techniques (such as Newton- 
Raphson). This exponential distribution has an advantage over a Gaussian distribution by restricting spikes to be 
positive, which improves inference quality (cf. Figure 0), and is a better approximation to a Poisson distribution with 
low rate. Furthermore, by utilizing the special banded structure of the Hessian matrix of the log-posterior, this approx- 
imate MAP spike train can be inferred fast enough on standard computers to use it for online analyses. Finally, all the 
parameters can be estimated from only the fluorescence observations, obviating the need for joint electrophysiology 
and imaging (cf. Figure 0. This approach is robust, in that it works "out-of-the-box" on all the in vivo and in vitro 
data analyzed (cf. Figure!!}. 

Ideally, one could compute the full joint posterior of entire spike trains, conditioned on the fluorescence data. This 
distribution is analytically intractable, due to the Poisson assumption on spike trains. A Bayesian approach could 
use Markov Chain Monte Carlo methods to recursively sample spikes until a whole sample spike train is obtained 
Il35ll36ll . Because a central aim here was computational expediency, a "greedy" approach is natural: i.e., recursively 
sample the most likely spike, update the posterior, and repeat until the posterior stops increasing. Template matching, 
projection pursuit regression [37|, and matching pursuit ll38l are examples of such a greedy approach (Greenberg et 
al's algorithm lfl2l could also be considered a special case of such a greedy approach). Both the greedy methods, and 
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Figure 8: A simulation demonstrating that using a better spatial filter can significantly enhance the effective SNR. 
The true spatial filter was a difference of Gaussians: a positively weighted Gaussian of small width, and a negatively 
weighted Gaussian with larger width (both with the same center). Top row far left: true spatial filter. Top row second 
from left: example movie frame. Top row second from right: typical spatial filter. Top row far right: mean frame. 
Middle row left: fluorescence trace using the boxcar spatial filter. Bottom row left: fast filter output using the boxcar 
spatial filter. Middle row right: fluorescence trace using true spatial filter. Bottom right: fast filter output using 
true spatial filter. Simulation details: a — Af(0, 21) — l.UV(0, 2.5/) where Af(fi, S) indicates a two-dimensional 
Gaussian with mean fi and covariance matrix S, (3 = 1, r = 0.85 sec, A = 5 Hz. 



the one developed here, aim to optimize a similar objective function. While greedy methods reduce the computational 
burden by restricting the search space of spike trains, here analytic approximations are made. The advantage of the 
greedy approaches relative to this one is that they result in a spike train (ie, a binary sequence). However, because 
of the numerical approximations and restrictions, one can never be sure whether the algorithm finds the most likely 
possible spike train. On the other hand, the approach developed herein is guaranteed to quickly find the most likely 
spike "train", but now the inferred spike train allows for partial spikes. One interesting future direction might be to 
explore whether greedy methods could be improved by initializing with a thresholded version of the fast filter output. 

Further, the fast filter is based on a biophysical model capturing key features of the data, and may therefore be 
straightforwardly generalized in several ways to improve accuracy. Unfortunately, some of these generalizations do 
not improve inference accuracy, perhaps because of the exponential approximation. Instead, the fast filter output can 
be used to initialize the more general SMC filter fl5l . to further improve inference quality (cf. Figure [7J. Another 
model generalization allows incorporation of spatial filtering of the raw movie into this approach (cf. Figure[8]l. 

A number of extensions follow from this work. First, pairing this filter with a crude but automatic segmentation 
tool to obtain ROIs would create a completely automatic algorithm that converts raw movies of populations of neurons 
into populations of spike trains. Second, combining this algorithm with recently developed connectivity inference 
algorithms on this kind of data [36|, could yield very efficient connectivity inference. 
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Figure 9: Simulation showing that even when two neurons' spatial filters are overlapping, one can separate the two 
spike trains by spatial filtering. Top left panel: mean frame from the movie. Bottom left: optimal one-dimensional 
fluorescence projections for the neuron 1 (black line) and neuron 2 (gray line), and their respective spike trains (black 
and gray '+' symbols, respectively). Top middle panel: the true spatial filter for neuron 1. Bottom middle panel: 
inferred (black line) and true (black '+' symbols) spike trains. Top right panel: the true spatial filter for neuron 
2. Bottom right panel: inferred (gray line) and true (gray '+' symbols) spike trains. Simulation details: a 1 = 
W([-1.8, 1.8] T , 21) a 2 = W([1.8, -1.8] T , 57), f3 = [1, 1] T , r = [0.5, 0.5] T sec, A = [1.5, 1.5] T Hz. 
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A Pseudocode 



B Wiener Filter 

The Poisson distribution in Eq. (0| can be replaced with a Gaussian instead of a Poisson distribution, ie, i%t 
Af(\A, AA), which, when plugged into Eq. (O yields: 



T 



argmax^ [^(Ft - a(C t + /3)) 2 + -Lfa - AA) 2 j . (39) 

Note that since fluorescence integrates over A, it makes sense that the mean scales with A. Further, since the Gaus- 
sian here is approximating a Poisson with high rate 11331 . the variance should scale with the mean. Using the same 
tridiagonal trick as above, Eq. (II lbb can be solved using Newton-Raphson once (because this expression is quadratic 
in n). Writing the above in matrix notation and substituting Ct — jCt-i for nt, yields: 

C = argmax-^ \\F - C\\ 2 - -L \\MC - AAl|| 2 , (40) 
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Figure 10: Simulation showing that even when two neurons' spatial filters are largely overlapping, spatial filters that 
together are linearly related to the true spatial filters can be inferred, to separate the two signals. Simulation details as 
above. Note that the spatial fields are sufficiently overlapping to cause significant "bleed-through" between the two 
signals. In particular, this is clear from the rise in the black line in the first few seconds of the bottom left panel, which 
should be fluorescence due to neuron 1, but is in fact due to spiking from neuron 2. Regardless, the spike trains are 
accurately inferred here. Simulation parameters as in Figure|9] 

Algorithm 1 Pseudocode for inferring the approximately most likely spike train, given fluorescence data. Note that 
£i <C 1 for i € {1, 2}; the algorithm is robust to small variations in each. The equations listed below refer to the most 
general equations in the text (simpler equations could be substituted when appropriate). Curly brackets, { •}, indicate 
comments. 

1: initialize parameters, 9 (section l2.4.1l ) 

2: while convergence criteria not met do 

3: for z = 1, 0.1, 0.01, . . do {interior point method to find C} 

4: Initialize n t = 6 for all t = 1, . . . , T, C x = and C t = jC t -i + n t for all f = 2, . . . , T. 
5: let C z be the initialized calcium, and V z , be the posterior given this initialization 
6: while V z > < V z do {Newton-Raphson with backtracking line searches} 
7: compute g using Eq. d26i > 

8: compute H using Eq. (|27| > 

9: compute d using H\g {(block-) tridiagonal Gaussian elimination} 

10: let C 7 j = C z + sd, where s is between and 1, and V z > > V z {backtracking line search} 

li: end while 
12: end for 

13: check convergence criteria 

14: update a using Eq. ( l33l l {only if spatial filtering} 

15: update (3 using Eq. d34i > 

16: let a be the root-mean square of the residual 

17: let A = ^ J2t 

18: end while 



which is quadratic in C. The gradient and Hessian are given by: 



g = -^(C-F)^^-((MC) T M + XAM T l), (41) 

H = h I + JK MTM -v (42) 
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Note that this solution is the optimal linear solution, under the assumption that spikes follow a Gaussian distribution, 
and is often referred to as the Wiener filter, regression with a smoothing prior, or ridge regression |24|. Estimating the 
parameters for this model follows similarly as described in section l2~4l 
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